ABSTRACT


Performance measures are derived for data-adaptive hypothesis testing by systems trained on stochastic data. The measures consist of the averaged performance of the systems over an ensemble of training sets. The uncertainties derivable from training sets represent an irreducible uncertainty inherent in the learning procedure. Data-adaptive system estimates are contrasted with classical hypothesis testing, in which optimum tests are based on an assumed data model. In addition, a performance estimate for the maximum a posteriori probability (MAP) $N$-hypothesis test is derived based on a neural-net formulation of the test. The performance of adaptive systems on a binary test of uniformly distributed data is compared with the data-adaptive and MAP estimates. The adaptive systems considered are linear extrapolation from data (LINEXT) and a back-propagation neural net (BPNN).
1. INTRODUCTION


Hypothesis testing by a data-adaptive system is fundamentally different from classical hypothesis testing. In the former, a representative data set corresponding to known hypotheses is used to train the system. System parameters are varied until the system's mapping from the training set to hypothesis space best approximates the known map. Two assumptions, a sufficiently representative training set and the ability of the system to associate, are required to extend the map to arbitrary data [1]. In contrast, classical hypothesis testing derives from an assumed model for the data, often a signal in Gaussian noise, from which optimum tests are defined [2].

In this report, performance measures are derived based only on the procedure by which an adaptive system is trained. We assume that if a system is perfectly trained on a representative data set for each hypothesis, an appropriate performance estimate is the averaged performance over the ensemble of training sets. This averaged performance, which is computed in terms of training-set size and data distributions, reflects an uncertainty inherent in the learning procedure.

A data-adaptive system of particular interest is the neural net. Relative to the now-conventional neural-net taxonomy [1,3], we will consider only the back-propagation mapping network [4-7]. This network adapts internal parameters toward the approximation of a functional mapping, which for hypothesis testing is the map from data input to hypothesis-space output. Alternative neural-net architectures, such as those employing Kohonen learning [8], attempt to store data distributions internally rather than to approximate a map to the hypothesis space. Neural-net classifiers generally perform as well as conventional techniques, including linear, Gaussian, and $k$-nearest-neighbor algorithms, on a variety of problems [3], [9-12]. Neural nets have also been configured to perform maximum a posteriori probability (MAP) [13] and maximum likelihood tests [14] for arbitrary input distributions.

In Section 2, training-set-based performance measures are derived for a data-adaptive system on an arbitrary data-based $N$-hypothesis test. A MAP test is also formulated and represented in a neural-net structure. A possible neural-net representation of the MAP test contains $N$ output neurons (processing elements). For a net input $x$, the $i$th neuron outputs $p(H_i|x) \in [0,1]$, which is the conditional probability for hypothesis $H_i$, $i = 1,\ldots,N$. The training-set-based and MAP estimates are applied in Section 3 to a binary hypothesis test on uniformly distributed data. These measures are compared to the computed performance of adaptive systems such as a linear extrapolation from the training set and a back-propagation neural net. Linear extrapolation (LINEXT) simply chooses the hypothesis of the nearest neighbor to the input, whereas a back-propagation neural net (BPNN) is trained to minimize the summed difference between net outputs and targets over the training set [4]. Both tests are data-adaptive in that the algorithms are defined using a training set. Section 3 shows that systems trained to the exact training-set map most closely match the training-set-based estimates. These systems are contrasted with systems trained on data biases, whose performance is better approximated by the Bayesian (MAP) estimate.
2. ADAPTIVE-SYSTEM PERFORMANCE MEASURES


In  this  section  two  performance  measures  are  defined  for  data-adaptive  systems:  the  training- 
set-based  estimate,  in  which  the  statistics  of  the  training  set  determine  the  performance,  and  the 
MAP  test  estimate.  The  MAP  hypothesis  test  is  formulated  with  an  assumed  neural-net  structure 
for  the  data  input  to  hypothesis  space  output  map. 

2.1  Training-Set-Based  Measures 

In this subsection the performance of an adaptive system is approximated from the statistics of the training set. Consider the training of an adaptive system for the testing of hypotheses $H_1,\ldots,H_N$ with prior probabilities $p(H_i)$, $i = 1,\ldots,N$. The prior probabilities are normalized to unity by the condition $\sum_{i=1}^{N} p(H_i) = 1$. The input to the system is the data value $x \in \mathbb{R}$, which is obtained by the observation of stochastic phenomena reflecting the set of possible hypotheses. We denote by OBS the operation of observing the phenomena, from which the data value $x$ is obtained. The OBS-generated data value $x$ is input to the adaptive system, which has an output $u = (u_1,\ldots,u_N)$ with $u_j$ nonzero corresponding to hypothesis $H_j$, $j = 1,\ldots,N$. Figure 1 contains a schematic of the OBS and adaptive system operations.

[Figure 1: hypothesis $H_i$, $i = 1,\ldots,N$ -> OBS -> data value $x$ with distribution $p(x|H_i)$ -> adaptive system -> output $u = (0,\ldots,0,1,0,\ldots,0)$ for hypothesis $H_j$, $j = 1,\ldots,N$]

Figure 1. Schematic of the OBS and adaptive system operations. Hypotheses $H_i$, $i = 1,\ldots,N$; OBS output $x$; neural-net output $u$.
The data value $x$ is assumed to have a conditional probability distribution $p(x|H_i)$, $i = 1,\ldots,N$, with hypothesis $H_i$. More specifically, the function $p(x|H_i)$ is the probability density that the OBS operation outputs $x$ for phenomena satisfying hypothesis $H_i$. The densities are normalized to unity, $\int_D p(x|H_i)\,dx = 1$, where $D \subset \mathbb{R}$ is the region of allowed $x$-values. The adaptive system is trained on the sets $\{x_1^1,\ldots,x_{M_1}^1\},\ldots,\{x_1^N,\ldots,x_{M_N}^N\}$ of OBS data outputs for each hypothesis $H_1,\ldots,H_N$. This training set results from $M_1$ trials of OBS with hypothesis $H_1$, $M_2$ trials of OBS with hypothesis $H_2$, continuing to $M_N$ trials of OBS with hypothesis $H_N$. The system is trained to exactly perform the mapping

$$ x_i^j \;\mapsto\; u = (0,\ldots,0,\underbrace{1}_{j\text{th}},0,\ldots,0), \qquad i = 1,\ldots,M_j,\quad j = 1,\ldots,N . \qquad (1) $$


A measure of system errors due to inherent training-set ambiguities is obtained from the performance on the training set $\{x_1^1,\ldots,x_{M_1}^1\},\ldots,\{x_1^N,\ldots,x_{M_N}^N\}$. This intuitively represents an upper bound on averaged system performance because added errors generally occur due to incorrect system association on arbitrary data. To compute the training-set-based measures, it is assumed that $M_1 + \cdots + M_N$ trials of OBS result exactly in the data set $\{x_1^1,\ldots,x_{M_1}^1\} \cup \cdots \cup \{x_1^N,\ldots,x_{M_N}^N\}$.

For a given data point $x_i^j$, $i = 1,\ldots,M_j$, $j = 1,\ldots,N$, the probability of having been generated by hypothesis $H_k$, $k = 1,\ldots,N$, is given by

$$ \mathrm{Prob}(x_i^j, H_k) = \frac{p(H_k)\,p(x_i^j|H_k)}{\sum_{l=1}^{N} p(H_l)\,p(x_i^j|H_l)} , \qquad (2) $$


where the normalization is over the hypotheses which could have generated $x_i^j$ in the $M_1 + \cdots + M_N$ trials. The system maps $x_i^j$ to hypothesis $H_j$, so the probability in Equation (2) contributes to the situation of a system declaration for hypothesis $H_j$ when the true hypothesis is $H_k$. Therefore, over the set of $M_1 + \cdots + M_N$ trials of OBS, the number of $H_j$ declarations for true hypothesis $H_k$ is given from Equation (2) by

$$ \mathrm{NUM}(H_j,H_k) = \sum_{i=1}^{M_j} \mathrm{Prob}(x_i^j,H_k) = \sum_{i=1}^{M_j} \frac{p(H_k)\,p(x_i^j|H_k)}{\sum_{l=1}^{N} p(H_l)\,p(x_i^j|H_l)} . \qquad (3) $$


The probability of a system declaration of $H_j$ for true hypothesis $H_k$ is then given by ($j,k = 1,\ldots,N$)

$$ p(H_j,H_k) = \frac{1}{\sum_{l=1}^{N} M_l} \sum_{i=1}^{M_j} \frac{p(H_k)\,p(x_i^j|H_k)}{\sum_{l=1}^{N} p(H_l)\,p(x_i^j|H_l)} . \qquad (4) $$

Note that the required normalization for the $M_1 + \cdots + M_N$ trials, $\sum_{k=1}^{N} p(H_j,H_k) = M_j / \sum_{l=1}^{N} M_l$, follows from Equation (4).
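The bookkeeping in Equations (2)-(4) can be sketched numerically. This is a minimal Python/NumPy sketch, assuming a hypothetical three-hypothesis example with Gaussian densities $p(x|H_k)$; the means, width, priors, and training-set sizes are illustrative choices, not values from this report. It computes $p(H_j,H_k)$ for one sampled training set and checks the normalization just noted.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Normal density, used here only as an assumed stand-in for p(x|H_k)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Hypothetical N = 3 example: class means, common width, priors p(H_k).
MU = np.array([-1.0, 0.0, 1.5])
SIGMA = 0.8
PRIORS = np.array([0.3, 0.5, 0.2])

def joint_from_training_set(train):
    """Eqs. (2)-(4). train[j] holds the M_j OBS outputs generated under H_j.

    Returns J with J[j, k] = p(H_j, H_k): declare H_j while H_k is true.
    """
    total = sum(len(xj) for xj in train)                       # M_1 + ... + M_N
    n = len(train)
    J = np.zeros((n, n))
    for j, xj in enumerate(train):
        w = gauss_pdf(xj[:, None], MU[None, :], SIGMA) * PRIORS  # p(H_k) p(x_i^j|H_k)
        prob = w / w.sum(axis=1, keepdims=True)                # Eq. (2): Prob(x_i^j, H_k)
        J[j] = prob.sum(axis=0) / total                        # Eq. (3) divided by sum of M_l
    return J

rng = np.random.default_rng(0)
M = [30, 50, 20]                                               # hypothetical sizes M_j
train = [rng.normal(MU[j], SIGMA, size=M[j]) for j in range(3)]
J = joint_from_training_set(train)
print(J.sum(axis=1))   # row j sums to M_j / (M_1 + M_2 + M_3)
```

Because every row of Equation (2) sums to one over $k$, the row sums of $p(H_j,H_k)$ depend only on the training-set sizes, not on the sampled data.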

follows  from  Equation  (4).  We  now  consider  the  average  oip{Hj,  Hk)  over  the  ensemble  of  training 
sets  obtained  by  the  above  procedure.  Recall  that  a;^  in  Equation  (4)  was  obtained  by  the  OBS 
operation  with  a  fixed  hypothesis  Hj,  indicating  that  the  appropriate  distribution  for  x\  isp(x^|i?j). 
Averaging  over  the  values  of  xj  in  Equation  (1),  we  obtain  an  averaged  probability  for  hypothesis 
Hj  declared  with  the  true  hypothesis  Hk, 


{p{Hj, Hk))  =  wiHk)Pj,k,  j,k  =  l,...,N 


(5) 


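where

$$ P_{j,k} = \int_D \frac{p(x|H_j)\,p(x|H_k)}{\sum_{l=1}^{N} p(H_l)\,p(x|H_l)}\,dx , \qquad (6) $$

and $\gamma_j$ is the proportion of hypothesis $H_j$-generated data in the training set for the adaptive system,

$$ \gamma_j = \frac{M_j}{\sum_{l=1}^{N} M_l} . \qquad (7) $$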


The joint probability in Equation (5) has factored into a training-ensemble-dependent parameter $\gamma_j$ and a statistics-dependent quantity $p(H_k)P_{j,k}$. An estimate of the conditional probability $p(H_i|H_j)$, corresponding to a decision for $H_i$ with true hypothesis $H_j$, is obtained from Equation (5) by

$$ p(H_i|H_j) = \frac{\langle p(H_i,H_j) \rangle}{\sum_{q=1}^{N} \langle p(H_q,H_j) \rangle} = \frac{\gamma_i P_{i,j}}{\sum_{q=1}^{N} \gamma_q P_{q,j}} , \qquad (8) $$

where $\gamma_i$ and $P_{i,j}$ are given in Equations (7) and (6), respectively. Equations (5)-(8) are referred to as the training-set-based measures of system performance in the following sections.
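Equations (5)-(8) can be checked the same way. The sketch below, under the same hypothetical Gaussian assumptions (illustrative means, width, priors, and sizes $M_j$), approximates $P_{j,k}$ of Equation (6) by a Riemann sum on a wide grid standing in for $D$, forms the estimate of Equation (8), and compares the ensemble average of Equation (4) over many sampled training sets with $\gamma_j p(H_k) P_{j,k}$ from Equation (5).

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Normal density, an assumed stand-in for p(x|H_k)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

MU = np.array([-1.0, 0.0, 1.5])          # hypothetical N = 3 example
SIGMA = 0.8
PRIORS = np.array([0.3, 0.5, 0.2])

def overlap_matrix(lo=-8.0, hi=8.0, n=20001):
    """Eq. (6) by a Riemann sum on a uniform grid standing in for D."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    lik = gauss_pdf(x[:, None], MU[None, :], SIGMA)   # lik[:, k] = p(x|H_k)
    mix = lik @ PRIORS                                # sum_l p(H_l) p(x|H_l)
    return (lik.T / mix) @ lik * dx

def conditional_estimate(gamma, P):
    """Eq. (8): column j holds p(H_i|H_j) = gamma_i P_ij / sum_q gamma_q P_qj."""
    num = gamma[:, None] * P
    return num / num.sum(axis=0, keepdims=True)

M = np.array([30, 50, 20])                            # hypothetical sizes M_j
gamma = M / M.sum()                                   # Eq. (7)
P = overlap_matrix()
expected = gamma[:, None] * PRIORS[None, :] * P       # Eq. (5)
cond = conditional_estimate(gamma, P)

# Ensemble average of Eq. (4) over many sampled training sets.
rng = np.random.default_rng(0)
n_sets = 2000
mc = np.zeros((3, 3))
for j in range(3):
    xj = rng.normal(MU[j], SIGMA, size=(n_sets, M[j]))
    w = gauss_pdf(xj[..., None], MU, SIGMA) * PRIORS  # shape (n_sets, M_j, 3)
    prob = w / w.sum(axis=-1, keepdims=True)          # Eq. (2)
    mc[j] = prob.sum(axis=1).mean(axis=0) / M.sum()   # Eq. (4), averaged over sets
print(np.round(cond, 3))
print(np.abs(mc - expected).max())
```

The Monte Carlo average converges to the factored form of Equation (5), and each column of the conditional estimate sums to one.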


2.2  Neural-Net  MAP  Test  Measures 


A more traditional approach to system performance estimation is through the MAP test [2]. For an OBS-generated input $x$, the hypothesis $H_j$ is chosen that maximizes the conditional probability $p(H_k|x)$, $k = 1,\ldots,N$. A neural net trained on sufficiently representative data has been shown to converge to the MAP test performance [13]. A mapping network for the $N$-hypothesis test consists of a single OBS-generated input $x$, a series of hidden layers, and an $N$-neuron output layer. A stochastic formulation of a MAP test neural net allows a comparison with the training-set-based estimates in Equations (5)-(8). The $N$ deepest-layer neurons are assumed to output only 0 or 1 in the pattern $(0,\ldots,0,1,0,\ldots,0)$, with the 1 in the $i$th position, $i = 1,\ldots,N$, with probability $q_i(x)$ for input $x$. The output $N$-vector with the 1 in the $i$th position corresponds to a decision for hypothesis $H_i$. The net output probabilities are normalized by the condition $\sum_{j=1}^{N} q_j(x) = 1$, $x \in D$. The joint probability $p(H_j,H_k|x)$ for choosing hypothesis $H_j$ with phenomena satisfying $H_k$, assuming net input $x$, is given by the product $q_j(x)\,p(H_k|x)$. The average over input values $x$ with a prior distribution $p(x)$ yields


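$$ p(H_j,H_k) = \int_D q_j(x)\,p(H_k|x)\,p(x)\,dx . \qquad (9) $$

The MAP test follows on average for

$$ q_j(x) = \frac{p(H_j|x)}{\sum_{l=1}^{N} p(H_l|x)} = p(H_j|x) , \qquad j = 1,\ldots,N . \qquad (10) $$

Substitution of Equation (10) into Equation (9) yields, upon application of Bayes's theorem,

$$ p(H_k|x) = \frac{p(H_k)\,p(x|H_k)}{\sum_{l=1}^{N} p(H_l)\,p(x|H_l)} , \qquad k = 1,\ldots,N , \qquad (11) $$

with $p(x) = \sum_{l=1}^{N} p(H_l)\,p(x|H_l)$, the equation

$$ p(H_j,H_k) = p(H_j)\,p(H_k)\,P_{j,k} , \qquad j,k = 1,\ldots,N , \qquad (12) $$

where $P_{j,k}$ is defined in Equation (6). Comparison of Equations (5) and (12) suggests that the MAP test estimate equals the training-set-based estimate if the training set satisfies $\gamma_j = p(H_j)$, $j = 1,\ldots,N$. This condition reflects the common-sense belief that the training set should be proportioned according to the prior probabilities of the hypotheses.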
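As a numerical illustration of Equations (9)-(12), the following sketch (the same hypothetical three-hypothesis Gaussian example, with a grid standing in for $D$) evaluates Equation (9) with $q_j(x) = p(H_j|x)$ and confirms that it matches $p(H_j)p(H_k)P_{j,k}$; with $\gamma_j = p(H_j)$ the training-set estimate of Equation (5) gives the same matrix.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Normal density, an assumed stand-in for p(x|H_k)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

MU = np.array([-1.0, 0.0, 1.5])          # hypothetical N = 3 example
SIGMA = 0.8
PRIORS = np.array([0.3, 0.5, 0.2])

x = np.linspace(-8.0, 8.0, 20001)
dx = x[1] - x[0]
lik = gauss_pdf(x[:, None], MU[None, :], SIGMA)   # p(x|H_k)
px = lik @ PRIORS                                 # p(x) = sum_l p(H_l) p(x|H_l)
post = lik * PRIORS / px[:, None]                 # Eq. (11): p(H_k|x)
q = post                                          # Eq. (10): q_j(x) = p(H_j|x)

joint_9 = (q.T * px) @ post * dx                  # Eq. (9): integral of q_j p(H_k|x) p(x)
P = (lik.T / px) @ lik * dx                       # Eq. (6)
joint_12 = np.outer(PRIORS, PRIORS) * P           # Eq. (12)
joint_5 = PRIORS[:, None] * PRIORS[None, :] * P   # Eq. (5) with gamma_j = p(H_j)
print(np.abs(joint_9 - joint_12).max())
```

Summing Equation (9) over $k$ returns $p(H_j)$, the rate at which the stochastic net declares $H_j$.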

A deterministic neural-net model for the MAP $N$-hypothesis test is obtained if the $N$ deepest-layer neurons output analog values in the range $(0,1)$. We assume that for net input $x \in D$ the $i$th neuron, $i = 1,\ldots,N$, literally outputs the value $p(H_i|x)$. The MAP test then results simply from choosing the hypothesis $H_i$ corresponding to the deepest-layer neuron with the largest output value. A schematic of the deterministic MAP test neural net is shown in Figure 2. In order to compute performance probabilities for this net, we must define regions $B_j$, $j = 1,\ldots,N$, given by

$$ B_j = \{\, x \in D \mid p(H_j|x) > p(H_k|x),\ \forall k \neq j \,\} . $$

Assuming that the regions of equal conditional probabilities, $S_{j,k} = \{\, x \in D \mid p(H_j|x) = p(H_k|x) \,\}$, $j \neq k$, $j,k = 1,\ldots,N$, have zero measure, we define the joint performance probability by the expression

$$ p^{MAP}(H_j,H_k) = p(H_k) \int_{B_j} p(x|H_k)\,dx , \qquad (13) $$

corresponding to the probability that the $j$th neuron output in Figure 2 is maximum for an $H_k$-generated input. The computation of the performance probabilities $p^{MAP}(H_j,H_k)$, $j,k = 1,\ldots,N$, follows from the application of Bayes's formula in Equation (11) to the definitions of regions $B_j$ and $S_{j,k}$. In Section 3, Equation (13) is applied to the binary hypothesis test on uniformly distributed data for comparison with the training-set-based estimates in Equations (5)-(8).

[Figure 2: OBS-generated input $x$ -> neural net -> $N$-neuron output]

Figure 2. Schematic of deterministic neural-net representation of MAP test. OBS-generated input $x$, $N$-hypothesis neuron output.
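For the deterministic net, Equation (13) can be approximated on a grid by labeling each grid point with the index of the largest posterior, which identifies the regions $B_j$. The sketch below reuses the hypothetical Gaussian example; as expected of a MAP rule, its probability of a correct decision (the trace of the joint matrix) is at least that of the stochastic net of Equation (12).

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Normal density, an assumed stand-in for p(x|H_k)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

MU = np.array([-1.0, 0.0, 1.5])          # hypothetical N = 3 example
SIGMA = 0.8
PRIORS = np.array([0.3, 0.5, 0.2])

x = np.linspace(-8.0, 8.0, 20001)
dx = x[1] - x[0]
lik = gauss_pdf(x[:, None], MU[None, :], SIGMA)   # p(x|H_k)
weighted = lik * PRIORS                           # proportional to p(H_k|x), Eq. (11)
winner = np.argmax(weighted, axis=1)              # j such that x lies in B_j

N = len(PRIORS)
joint_det = np.zeros((N, N))
for j in range(N):
    in_Bj = winner == j
    joint_det[j] = PRIORS * lik[in_Bj].sum(axis=0) * dx   # Eq. (13)

# Stochastic net of Eq. (12), for comparison.
px = weighted.sum(axis=1)
joint_stoch = np.outer(PRIORS, PRIORS) * ((lik.T / px) @ lik * dx)
print(np.trace(joint_det), np.trace(joint_stoch))  # probabilities of a correct decision
```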
3. BINARY HYPOTHESIS TEST


In this section the performance measures derived in Section 2 are applied to a binary hypothesis test of uniformly distributed data. The averaged probabilities in Equations (8) and (13) are related to parameters in the hypothesis probability distributions. This relationship defines a framework for comparison of the estimates with examples of adaptive system performance.


3.1  Training-Set-Based  and  MAP  Estimates 


Consider a training-set-based decision between hypotheses $H_0$ and $H_1$ with prior probabilities $p_0 = p(H_0)$ and $p_1 = p(H_1)$, respectively. Assume the output from the OBS operation, $x$, has conditional probabilities $p(x|H_i)$, $i = 0,1$, for phenomena satisfying hypothesis $H_i$. The system performance is defined by the standard conditional probabilities of detection $P_d = p(H_1|H_1)$, false alarm $P_f = p(H_1|H_0)$, miss $P_m = p(H_0|H_1)$, and correct $H_0$ identification $P_{cH_0} = p(H_0|H_0)$. Assuming a training set consisting of $N_i$, $i = 0,1$, trials of OBS with hypothesis $H_i$, we have, from Equation (8),


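$$ P_d = \frac{\gamma_1 P_{1,1}}{\gamma_1 P_{1,1} + \gamma_0 P_{0,1}} , \qquad (14) $$

$$ P_f = \frac{\gamma_1 P_{1,0}}{\gamma_1 P_{1,0} + \gamma_0 P_{0,0}} , \qquad (15) $$

$$ P_m = \frac{\gamma_0 P_{0,1}}{\gamma_0 P_{0,1} + \gamma_1 P_{1,1}} , \qquad (16) $$

and

$$ P_{cH_0} = \frac{\gamma_0 P_{0,0}}{\gamma_0 P_{0,0} + \gamma_1 P_{1,0}} , \qquad (17) $$

where $\gamma_i = N_i/(N_0 + N_1)$, $i = 0,1$, and

$$ P_{j,k} = \int_D \frac{p(x|H_j)\,p(x|H_k)}{p_0\,p(x|H_0) + p_1\,p(x|H_1)}\,dx , \qquad j,k = 0,1 , \qquad (18) $$

with $D$ the region of possible $x$-values.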

A common situation that occurs in the conventional Neyman-Pearson test is the existence of a maximum tolerated joint false alarm probability $P_F = p(H_1,H_0)$. From Equation (5), a maximum joint false alarm probability $P_{F_0}$ implies an upper bound on the proportion of $H_1$ trials in the training set, i.e., the condition $\gamma_1 \le P_{F_0}/(p_0 P_{1,0})$. There is also a corresponding upper bound on the joint detection probability $P_D = p(H_1,H_1)$, given by $P_D \le P_{F_0}\, p_1 P_{1,1}/(p_0 P_{1,0})$.
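The Neyman-Pearson-like bounds above reduce to a few lines of arithmetic. In this sketch the helper name `np_like_bounds` is hypothetical, and the numbers are illustrative: $P_{1,0} = 0.5$ and $P_{1,1} = 1.5$ are what Equations (24) and (25) below give for equal widths with $x_1 = 0.5\Delta$ and $p_0 = p_1 = 0.5$. The bound on $\gamma_1$ is capped at 1, since a proportion cannot exceed unity; when the cap is inactive, the joint detection bound is exactly the expression in the text.

```python
def np_like_bounds(pf0, p0, P10, P11):
    """Hypothetical helper: bounds implied by a maximum joint false-alarm
    probability pf0, using Eq. (5): <p(H1, H0)> = gamma1 * p0 * P10.

    Returns (largest allowed gamma1, largest joint detection gamma1 * p1 * P11).
    """
    p1 = 1.0 - p0
    gamma1_max = min(1.0, pf0 / (p0 * P10))   # a proportion cannot exceed 1
    pd_joint_max = gamma1_max * p1 * P11      # = pf0 p1 P11 / (p0 P10) when uncapped
    return gamma1_max, pd_joint_max

# Illustrative values: equal widths, x1 = 0.5 * Delta, p0 = p1 = 0.5.
g1_max, pdj_max = np_like_bounds(pf0=0.1, p0=0.5, P10=0.5, P11=1.5)
print(g1_max, pdj_max)
```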
The MAP test performance measure in Equation (13) can also be applied to the binary hypothesis test. Assume that for $x \in S_{0,1}$, which is the region of equal a posteriori probability, the test chooses between hypotheses $H_0$ and $H_1$ with equal probability. In this case the neural net has equal output values from the two deepest-layer neurons in Figure 2. The expression in Equation (13) is easily generalized to obtain the conditional probabilities

$$ P_d^{MAP} = \int_{B_1} p(x|H_1)\,dx + \frac{1}{2} \int_{S_{0,1}} p(x|H_1)\,dx , \qquad (19) $$

$$ P_f^{MAP} = \int_{B_1} p(x|H_0)\,dx + \frac{1}{2} \int_{S_{0,1}} p(x|H_0)\,dx , \qquad (20) $$

$$ P_m^{MAP} = \int_{B_0} p(x|H_1)\,dx + \frac{1}{2} \int_{S_{0,1}} p(x|H_1)\,dx , \qquad (21) $$

and

$$ P_{cH_0}^{MAP} = \int_{B_0} p(x|H_0)\,dx + \frac{1}{2} \int_{S_{0,1}} p(x|H_0)\,dx . \qquad (22) $$


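In order to compare performance measures in Equations (14)-(17) with the MAP estimates in Equations (19)-(22), consider the case of uniformly distributed conditional probabilities $p(x|H_i)$ of width $\Delta_i$, $i = 0,1$. Figure 3 contains uniform distributions $p(x|H_i)$, $i = 0,1$, normalized to a peak value of $1/\Delta_i$, centered at 0 for $H_0$ and at $x_1$ for $H_1$. The distributions overlap under the condition $|\Delta_0 - \Delta_1|/2 < x_1 < (\Delta_0 + \Delta_1)/2$. Substitution of the uniform distributions in Equation (18) results in expressions for $P_{i,j}$, $i,j = 0,1$, given by

$$ P_{0,0} = \frac{x_1 - (\Delta_1 - \Delta_0)/2}{p_0 \Delta_0} + \frac{\Delta_1 \left[ (\Delta_0 + \Delta_1)/2 - x_1 \right]}{\Delta_0 (p_0 \Delta_1 + p_1 \Delta_0)} , \qquad (23) $$

$$ P_{0,1} = P_{1,0} = \frac{(\Delta_0 + \Delta_1)/2 - x_1}{p_0 \Delta_1 + p_1 \Delta_0} , \qquad (24) $$

and

$$ P_{1,1} = \frac{x_1 + (\Delta_1 - \Delta_0)/2}{p_1 \Delta_1} + \frac{\Delta_0 \left[ (\Delta_0 + \Delta_1)/2 - x_1 \right]}{\Delta_1 (p_0 \Delta_1 + p_1 \Delta_0)} . \qquad (25) $$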
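The closed forms (23)-(25) can be cross-checked against a direct evaluation of Equation (18). This sketch assumes illustrative widths, separation, and prior that satisfy the overlap condition, and uses a midpoint Riemann sum over the union of the two supports; the helper names are hypothetical.

```python
import numpy as np

def uniform_pdf(x, center, width):
    """Uniform density of the given width centred at `center` (Figure 3)."""
    return np.where(np.abs(x - center) <= width / 2.0, 1.0 / width, 0.0)

def overlaps_closed_form(d0, d1, x1, p0):
    """Eqs. (23)-(25); requires |d0 - d1|/2 < x1 < (d0 + d1)/2."""
    p1 = 1.0 - p0
    s = p0 * d1 + p1 * d0
    ov = (d0 + d1) / 2.0 - x1                      # length of the overlap region
    P00 = (x1 - (d1 - d0) / 2.0) / (p0 * d0) + d1 * ov / (d0 * s)
    P01 = ov / s
    P11 = (x1 + (d1 - d0) / 2.0) / (p1 * d1) + d0 * ov / (d1 * s)
    return np.array([[P00, P01], [P01, P11]])

def overlaps_numeric(d0, d1, x1, p0, n=400_000):
    """Eq. (18) by a midpoint Riemann sum over [-d0/2, x1 + d1/2]."""
    lo, hi = -d0 / 2.0, x1 + d1 / 2.0
    dx = (hi - lo) / n
    x = lo + dx * (np.arange(n) + 0.5)
    lik = np.stack([uniform_pdf(x, 0.0, d0), uniform_pdf(x, x1, d1)], axis=1)
    mix = lik @ np.array([p0, 1.0 - p0])           # p0 p(x|H0) + p1 p(x|H1)
    return (lik.T / mix) @ lik * dx

# Illustrative parameters satisfying the overlap condition (0.2 < 0.5 < 0.8).
closed = overlaps_closed_form(d0=1.0, d1=0.6, x1=0.5, p0=0.4)
numeric = overlaps_numeric(d0=1.0, d1=0.6, x1=0.5, p0=0.4)
print(np.round(closed, 4))
print(np.abs(closed - numeric).max())
```

A further consistency check follows from Equation (18) itself: $p_0 P_{j,0} + p_1 P_{j,1} = 1$ for each $j$.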
Figure 3. Binary hypothesis test on uniformly distributed data. Input probability distributions $p(x|H_i)$, $i = 0,1$. Width $\Delta_i$, $i = 0,1$; center 0 for $H_0$ and $x_1$ for $H_1$.


The adaptive system simulation in Section 3.2 is for uniform probability distributions of equal width, $\Delta_0 = \Delta_1 = \Delta$, separated by $x_1 = K\Delta$. The $K$-factor parameterization of overlapped distributions is convenient for analysis of system discrimination performance [15]. The overlapped distribution condition corresponds to $K \in [0,1]$, with $K$ of unity for non-overlapped distributions. The training-set-based measures for uniform distributions are obtained from the substitution of Equations (23)-(25) into Equations (14)-(17), with the result

$$ P_d = \frac{\gamma_1 \left[ K + (1-K)p_1 \right]}{\gamma_1 K + (1-K)p_1} , \qquad (26) $$

$$ P_f = \frac{\gamma_1 p_0 (1-K)}{\gamma_0 K + (1-K)p_0} , \qquad (27) $$

$$ P_m = \frac{\gamma_0 p_1 (1-K)}{\gamma_1 K + (1-K)p_1} , \qquad (28) $$

and

$$ P_{cH_0} = \frac{\gamma_0 \left[ K + (1-K)p_0 \right]}{\gamma_0 K + (1-K)p_0} . \qquad (29) $$
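Equations (26)-(29) can be verified by feeding the equal-width case of Equations (23)-(25), where $\Delta$ cancels, into the general expressions (14)-(17). The sketch below does this over a grid of $K$, $\gamma_1$, and $p_1$ values chosen for illustration; the function names are hypothetical.

```python
import numpy as np

def measures_from_overlaps(P, gamma1):
    """Eqs. (14)-(17) from the 2x2 matrix P[j, k] of Eq. (18), gamma1 = N1/(N0+N1)."""
    g = np.array([1.0 - gamma1, gamma1])
    num = g[:, None] * P                     # gamma_j P_{j,k}
    cond = num / num.sum(axis=0)             # cond[j, k] = p(H_j | H_k), Eq. (8)
    return np.array([cond[1, 1], cond[1, 0], cond[0, 1], cond[0, 0]])  # Pd, Pf, Pm, PcH0

def equal_width_overlaps(K, p1):
    """Eqs. (23)-(25) with Delta0 = Delta1 = Delta and x1 = K*Delta (Delta cancels)."""
    p0 = 1.0 - p1
    return np.array([[K / p0 + 1.0 - K, 1.0 - K],
                     [1.0 - K, K / p1 + 1.0 - K]])

def closed_form(K, gamma1, p1):
    """Eqs. (26)-(29)."""
    gamma0, p0 = 1.0 - gamma1, 1.0 - p1
    pd = gamma1 * (K + (1 - K) * p1) / (gamma1 * K + (1 - K) * p1)
    pf = gamma1 * p0 * (1 - K) / (gamma0 * K + (1 - K) * p0)
    pm = gamma0 * p1 * (1 - K) / (gamma1 * K + (1 - K) * p1)
    pc0 = gamma0 * (K + (1 - K) * p0) / (gamma0 * K + (1 - K) * p0)
    return np.array([pd, pf, pm, pc0])

max_diff = 0.0
for K in np.linspace(0.0, 0.95, 20):
    for gamma1 in (0.2, 0.5, 0.8, 0.9):
        for p1 in (0.3, 0.5, 0.7):
            a = measures_from_overlaps(equal_width_overlaps(K, p1), gamma1)
            b = closed_form(K, gamma1, p1)
            max_diff = max(max_diff, np.abs(a - b).max())
print(max_diff)   # agreement up to floating-point rounding
```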


The MAP test measures in Equations (19)-(22) can be computed for the uniform data distributions in Figure 3. Note that for the case $\Delta_0 = \Delta_1 = \Delta$, $x_1 = K\Delta$, and $p_0 = p_1 = 0.5$, the region of equal a posteriori probability $S_{0,1}$ is $(\Delta(K - \tfrac{1}{2}), \Delta/2]$; the dominant hypothesis regions are given by $B_0 = [-\Delta/2, \Delta(K - \tfrac{1}{2})]$ and $B_1 = (\Delta/2, \Delta(K + \tfrac{1}{2})]$. Substitution of these regions into Equations (19)-(22) with uniform conditional probabilities $p(x|H_i)$, $i = 0,1$, yields

$$ P_d^{MAP} = P_{cH_0}^{MAP} = \frac{1+K}{2} \qquad (30) $$

and

$$ P_f^{MAP} = P_m^{MAP} = \frac{1-K}{2} . \qquad (31) $$

Note that for the case $\gamma_0 = p_0 = 0.5$ and $\gamma_1 = p_1 = 0.5$, Equations (26)-(29) and Equations (30) and (31) are identical, as expected for a training set proportioned according to prior probabilities. Note also that the condition $p_0 = p_1$ implies that the MAP test is equivalent to the maximum likelihood test, which maximizes $p(x|H_j)$, $j = 0,1$, to determine the hypothesis.
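A short sketch comparing the MAP measures (30)-(31) with the training-set-based measures (26)-(29) for $p_0 = p_1 = 0.5$: they coincide for $\gamma_1 = 0.5$, and for a training set proportioned toward $H_1$ (here $\gamma_1 = 0.8$, an illustrative value) both $P_d$ and $P_f$ exceed their MAP values for $0 < K < 1$. The function names are hypothetical.

```python
import numpy as np

def training_set_measures(K, gamma1, p1=0.5):
    """Eqs. (26)-(29): (Pd, Pf, Pm, PcH0) for equal widths and x1 = K*Delta.
    By those equations Pd + Pm = 1 and Pf + PcH0 = 1."""
    gamma0, p0 = 1.0 - gamma1, 1.0 - p1
    pd = gamma1 * (K + (1 - K) * p1) / (gamma1 * K + (1 - K) * p1)
    pf = gamma1 * p0 * (1 - K) / (gamma0 * K + (1 - K) * p0)
    return np.array([pd, pf, 1.0 - pd, 1.0 - pf])

def map_measures(K):
    """Eqs. (30)-(31): (Pd, Pf, Pm, PcH0) for p0 = p1 = 0.5."""
    return np.array([(1 + K) / 2, (1 - K) / 2, (1 - K) / 2, (1 + K) / 2])

Ks = np.linspace(0.05, 0.95, 19)
balanced = max(np.abs(training_set_measures(K, 0.5) - map_measures(K)).max() for K in Ks)
skewed = np.array([training_set_measures(K, 0.8) - map_measures(K) for K in Ks])
print(balanced)                                        # gamma1 = p1: identical up to rounding
print(skewed[:, 0].min() > 0, skewed[:, 1].min() > 0)  # Pd and Pf both raised
```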

3.2  Performance  Estimate  Comparisons 

In  the  analysis  of  the  previous  sections,  performance  measures  for  an  adaptive  system  were 
obtained  from  the  statistics  of  the  training  set.  We  also  derived  estimates  based  on  the  assumption 
that  an  adaptive  system,  realized  as  a  neural  net,  performs  a  MAP  N-hypothesis  test.  Back- 
propagation  neural  nets  are  adaptive  systems  consisting  of  connected  layers  of  processing  elements 
(neurons)  with  adjustable  connection  weights  between  layers  and  adjustable  thresholds  on  each 
neuron.  A  training  set  for  decision  making  is  used  to  adjust  net  parameters  so  that  the  net 
performs  a  map  between  the  input  data  space  and  the  output  hypothesis  space.  Both  the  training- 
set-based  and  MAP  estimates  defined  in  the  previous  sections  involve  particular  assumptions  about 
the  adaptive  system.  The  training-set-based  estimate  assumes  that  the  system  power  of  association, 
i.e.,  the  ability  to  decide  on  data  not  trained  on,  does  not  affect  the  system  performance.  The  MAP 
estimate for neural nets assumes that, regardless of training-set composition, the network literally outputs the conditional probabilities $p(H_i|x)$, $i = 1,\ldots,N$, in the $N$-neuron deepest layer. In this section we compare the training-set-based and MAP performance measures on the binary hypothesis test in Section 3.1. The performances of two adaptive systems, a linear extrapolation from the data set (LINEXT) and a back-propagation neural net (BPNN), are compared with the two performance measures.
the training set is $H_1$-generated, $\gamma_1 = 0.5$, the training-set-based probabilities are linear in $K$ and match the MAP estimates. The results in Figure 4 indicate that a training set proportioned toward $H_1$, i.e., $\gamma_1 > \gamma_0$, increases $P_d$ (at the expense of $P_f$) over the MAP test estimate. The reverse situation occurs for a training set proportioned toward $H_0$.

A simple adaptive algorithm for hypothesis testing on binary hypotheses is linear extension (LINEXT) from the training set. Assume a training set of OBS-generated data $\{x_1^0,\ldots,x_{N_0}^0\} \cup \{x_1^1,\ldots,x_{N_1}^1\}$, where $x_i^j$ is $H_j$-generated. An input $x$ is mapped by LINEXT to the hypothesis of the nearest element of the training set. If the adaptive system is viewed as a map $f$ from $D$ to $\{0,1\}$, with $f(x) = i$ for hypothesis $H_i$, then the nearest-training-set-neighbor algorithm is simply a linear extension of $f$ from the training set. The hypothesis chosen for input $x$ results from a decision threshold at 0.5, i.e., $f(x)$ greater (less) than 0.5 implies hypothesis $H_1$ ($H_0$). The performance of the LINEXT algorithm was tested by creating a training set with $N_0 = 100\gamma_0$ and $N_1 = 100\gamma_1$ $H_0$- and $H_1$-generated elements, respectively. We considered the cases of $\gamma_0 = 0.1$ and $0.2$ separately and in each case used a training ensemble consisting of 1000 training sets. The LINEXT algorithm for each training set was tested with an independent performance set of 400 elements. Each performance-set element was generated by first choosing $H_0$ or $H_1$ phenomena according to the $p_0$ and $p_1$ prior probabilities. The chosen hypothesis $H_i$ determined the distribution $p(x|H_i)$ (Figure 3) used to generate the data value $x \in D$. The LINEXT algorithm was applied to $x \in D$ and the output hypothesis $H_j$ was compared to the originating hypothesis $H_i$. The number of elements mapped to $H_j$ originating from an observation of $H_i$, divided by the number of elements in the performance set from $H_i$, yielded the conditional probability $p(H_j|H_i)$. Figures 5 and 6 show the average performance of the LINEXT algorithm, in which the performance-set probability estimates described above were averaged over the training ensemble of 1000 sets. Figure 5 contains a plot of $P_d$, $P_f$, $P_m$, and $P_{cH_0}$ as a function of $K$ for an ensemble of training sets with $\gamma_0 = 0.2$ and $\gamma_1 = 0.8$. Note that the experimental performance of LINEXT was well approximated by the training-set-based estimate (dashed line), particularly for the $P_d$ and $P_m$ probabilities. As seen in Figure 6, similar results were obtained for an ensemble of training sets with $\gamma_0 = 0.1$ and $\gamma_1 = 0.9$. Note that the MAP test estimate (dotted line) provided a less successful prediction of LINEXT performance.
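The LINEXT experiment can be sketched as a small Monte Carlo simulation. Assumptions: $\Delta = 1$, 100-element training sets, and fewer training sets (200) than the 1000 used for Figures 5 and 6, to keep the run short; the function names are hypothetical. The sketch also confirms that thresholding the piecewise-linear extension of $f$ at 0.5 gives the same decision as the nearest-training-set-neighbor rule.

```python
import numpy as np

def linext_decide(train_x, train_h, x):
    """LINEXT: linearly interpolate f (0 for H0, 1 for H1) between the sorted
    training points and threshold at 0.5. np.interp holds the end values
    constant outside the training range."""
    order = np.argsort(train_x)
    f = np.interp(x, train_x[order], train_h[order].astype(float))
    return (f > 0.5).astype(int)

def simulate_linext(K, gamma0, p0=0.5, n_sets=200, n_perf=400, n_train=100, seed=0):
    """Average (Pd, Pf, Pm, PcH0) over an ensemble of training sets.
    Delta = 1: H0 ~ U[-1/2, 1/2], H1 ~ U[K - 1/2, K + 1/2]."""
    rng = np.random.default_rng(seed)
    n0 = int(round(n_train * gamma0))
    n1 = n_train - n0
    totals = np.zeros(4)
    for _ in range(n_sets):
        tx = np.concatenate([rng.uniform(-0.5, 0.5, n0), rng.uniform(K - 0.5, K + 0.5, n1)])
        th = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])
        truth = (rng.random(n_perf) >= p0).astype(int)   # H1 with probability 1 - p0
        x = rng.uniform(-0.5, 0.5, n_perf) + K * truth   # H1 data centred at x1 = K
        d = linext_decide(tx, th, x)
        pd = np.mean(d[truth == 1] == 1)
        pf = np.mean(d[truth == 0] == 1)
        totals += [pd, pf, 1.0 - pd, 1.0 - pf]
    return totals / n_sets

# Check: LINEXT coincides with the nearest-training-set-neighbour rule.
rng = np.random.default_rng(1)
tx = rng.uniform(-0.5, 1.0, 50)
th = (rng.random(50) < 0.6).astype(int)
xq = rng.uniform(-1.0, 1.5, 1000)
nn = th[np.argmin(np.abs(xq[:, None] - tx[None, :]), axis=1)]
same = np.array_equal(linext_decide(tx, th, xq), nn)

pd, pf, pm, pc0 = simulate_linext(K=0.5, gamma0=0.2)
print(same, round(pd, 3), round(pf, 3))
```

The simulated probabilities can be set beside Equations (26)-(29) and (30)-(31) for the same $K$, $\gamma_1$, and priors.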

Figure 5. Detection, false alarm, miss, and correct $H_0$ probabilities versus $K$ for LINEXT algorithm on binary hypothesis test. Averaged over 1000 training sets with $\gamma_0 = 0.2$ and $\gamma_1 = 0.8$. Prior probabilities $p_0 = p_1 = 0.5$. Each trained system performance-tested with 400 elements.

Figure 6. Detection, false alarm, miss, and correct $H_0$ probabilities versus $K$ for LINEXT algorithm on binary hypothesis test. Averaged over 1000 training sets with $\gamma_0 = 0.1$ and $\gamma_1 = 0.9$. Prior probabilities $p_0 = p_1 = 0.5$. Each trained system performance-tested with 400 elements.

The LINEXT algorithm above performs the decision-space mapping on the training-set elements exactly. For a sufficiently representative training set in the overlap region of Figure 3, this necessitates a mapping with undulations between the $H_0$ and $H_1$ hypotheses. A three-layer BPNN has been shown to be sufficient to perform any reasonable functional mapping [16]. A rough estimate of the required number of neurons is obtained from the BPNN threshold function,

$$ T_\theta(I) = \frac{1}{1 + \exp(-I + \theta)} , $$

which is applied to the input $I$ of a neuron with threshold value $\theta$. An undulation in the net input/output mapping is easily represented as the difference of two neuron threshold functions, $T_\theta - T_{\theta'}$. This suggests that a mapping with $p$ undulations requires at least $2p$ hidden-layer neurons
in the three-layer net. However, a BPNN trained to exactly perform the hypothesis-space map over a training set most likely approximates the training-set-based performance estimate. A neural net with performance matching the MAP estimate is trained on data biases rather than on an exact training-set map. In order to obtain MAP test performance, we considered a BPNN with a single input, sixteen hidden-layer neurons, and two output neurons. The net was trained to perform the binary hypothesis test on uniformly distributed data of equal width ($\Delta_0 = \Delta_1 = \Delta$) and equal prior probabilities ($p_0 = p_1 = 0.5$). For each training set of 20 $H_0$- and 80 $H_1$-generated inputs ($\gamma_0 = 0.2$, $\gamma_1 = 0.8$), the net was trained to map to $(1,0)$ and $(0,1)$ for $H_0$ and $H_1$, respectively. To avoid mapping to training-set undulations and to train only on data biases, the inputs from the overlapped regions in Figure 3 were removed from the training sets. The performance probabilities for the trained BPNNs as a function of $K$ are shown in Figure 7. For each $K$, ten BPNNs were trained on independent training sets of 20 $H_0$- and 80 $H_1$-generated inputs. For each trained BPNN, a set of 1000 four-hundred-element performance sets, each with equal prior probabilities $p_i$ of 0.5, was used to compute performance probabilities. The BPNN output decisions were determined by the larger neuron output in the third layer. As with the LINEXT algorithm, the conditional probabilities $p(H_i|H_j)$ were determined by counting the number of $H_i$ decisions from $H_j$-generated data and dividing by the total number of $H_j$-generated elements in the performance set. As demonstrated in Figure 7, although the training set was proportioned toward $H_1$ ($\gamma_1 = 0.8$), the BPNN performance was best approximated by the MAP estimate (dotted line). These results highlight the fundamental difference between training-set-based and MAP-test estimation of adaptive system performance.
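The neuron-count argument can be illustrated directly. In this sketch (hypothetical helper names and a hypothetical labeled training set on a uniform grid), each run of $H_1$ labels in the sorted training set is one undulation, built from a pair of threshold functions $T_\theta$ sharing a common input weight (the `gain` parameter, an assumption chosen steep enough to resolve half a grid spacing). The resulting three-layer map reproduces every training label using $2p$ hidden neurons and a linear output.

```python
import numpy as np

def threshold(I, theta):
    """BPNN threshold function T_theta(I) = 1 / (1 + exp(-I + theta))."""
    return 1.0 / (1.0 + np.exp(-I + theta))

def bump_edges(x_sorted, labels_sorted):
    """Hypothetical helper: one (left, right) edge pair per run of H1 labels,
    placed midway between neighbouring training points of opposite label."""
    spacing = np.min(np.diff(x_sorted))
    edges, i, n = [], 0, len(labels_sorted)
    while i < n:
        if labels_sorted[i] == 1:
            j = i
            while j + 1 < n and labels_sorted[j + 1] == 1:
                j += 1
            left = x_sorted[i] - spacing / 2 if i == 0 else 0.5 * (x_sorted[i - 1] + x_sorted[i])
            right = x_sorted[j] + spacing / 2 if j == n - 1 else 0.5 * (x_sorted[j] + x_sorted[j + 1])
            edges.append((left, right))
            i = j + 1
        else:
            i += 1
    return edges

def three_layer_output(x, edges, gain):
    """Sum over undulations of T(gain*x, gain*left) - T(gain*x, gain*right):
    two hidden threshold neurons per undulation, 2p in total."""
    out = np.zeros_like(x, dtype=float)
    for left, right in edges:
        out += threshold(gain * x, gain * left) - threshold(gain * x, gain * right)
    return out

x = np.linspace(0.0, 1.0, 21)                     # hypothetical 1-D training inputs
labels = np.array([0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1])
edges = bump_edges(x, labels)
gain = 20.0 / np.min(np.diff(x))                  # assumed steepness
net = three_layer_output(x, edges, gain)
print(len(edges), 2 * len(edges))                 # prints: 4 8
```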
Figure 7. Detection, false alarm, miss, and correct $H_0$ probabilities versus $K$ for BPNN algorithm on the binary hypothesis test. Averaged over 10 training sets with $\gamma_0 = 0.2$ and $\gamma_1 = 0.8$. Prior probabilities $p_0 = p_1 = 0.5$. Each trained system performance-tested with 1000 sets of 400 elements each.
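A minimal sketch of the BPNN experiment for a single training set and a single $K$, assuming plain full-batch back-propagation on squared error with a fixed learning rate, weight initialization, and epoch count (these training details are not given in the report). It removes the overlap-region inputs as described above, trains a 1-16-2 sigmoid net, and estimates $P_d$ and $P_f$ from one 400-element performance set for comparison with the MAP values $(1+K)/2$ and $(1-K)/2$ of Equations (30)-(31). Results of one run vary with the random seed; the experiment above averages ten nets and many performance sets.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_training_set(rng, K, n0=20, n1=80):
    """Delta = 1: H0 ~ U[-1/2, 1/2], H1 ~ U[K - 1/2, K + 1/2]. Inputs in the overlap
    region (K - 1/2, 1/2) are dropped so that only the data biases are learned."""
    x = np.concatenate([rng.uniform(-0.5, 0.5, n0), rng.uniform(K - 0.5, K + 0.5, n1)])
    t = np.concatenate([np.tile([1.0, 0.0], (n0, 1)), np.tile([0.0, 1.0], (n1, 1))])
    keep = (x <= K - 0.5) | (x >= 0.5)
    return x[keep][:, None], t[keep]

def train_bpnn(rng, x, t, n_hidden=16, lr=2.0, epochs=10_000):
    """Full-batch back-propagation on mean squared error (assumed), 1-n_hidden-2 net."""
    W1 = rng.normal(0.0, 1.0, (1, n_hidden))
    b1 = rng.normal(0.0, 1.0, n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, 2))
    b2 = np.zeros(2)
    n = len(x)
    for _ in range(epochs):
        h = sigmoid(x @ W1 + b1)                 # hidden layer
        y = sigmoid(h @ W2 + b2)                 # output layer
        d2 = (y - t) * y * (1.0 - y)             # output deltas
        d1 = (d2 @ W2.T) * h * (1.0 - h)         # hidden deltas (W2 before update)
        W2 -= lr * (h.T @ d2) / n
        b2 -= lr * d2.sum(axis=0) / n
        W1 -= lr * (x.T @ d1) / n
        b1 -= lr * d1.sum(axis=0) / n
    return W1, b1, W2, b2

def decide(params, x):
    """Index of the larger output neuron: 0 -> H0, 1 -> H1."""
    W1, b1, W2, b2 = params
    y = sigmoid(sigmoid(x[:, None] @ W1 + b1) @ W2 + b2)
    return np.argmax(y, axis=1)

rng = np.random.default_rng(1)
K = 0.5
x_tr, t_tr = make_training_set(rng, K)
params = train_bpnn(rng, x_tr, t_tr)
train_acc = np.mean(decide(params, x_tr[:, 0]) == np.argmax(t_tr, axis=1))

# One 400-element performance set with p0 = p1 = 0.5.
truth = (rng.random(400) >= 0.5).astype(int)
x = rng.uniform(-0.5, 0.5, 400) + K * truth
d = decide(params, x)
pd = np.mean(d[truth == 1] == 1)
pf = np.mean(d[truth == 0] == 1)
print(train_acc, pd, pf, (1 + K) / 2, (1 - K) / 2)   # last two: Eqs. (30)-(31)
```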
4. CONCLUSION


In this report two distinct performance measures were identified for adaptive systems. Training-set-based estimation of system performance was derived from the statistics of the training set. These statistics are relevant if system errors reflect uncertainties inherent in the learning procedure. The measures are independent of a particular adaptive system, although it was argued that systems which perform training-set map undulations are described by training-set-based estimates. The training-set-based measures were compared to the performance of a MAP test, which is easily represented in a neural net. It was suggested that systems trained for data biases rather than an exact training-set map are best described by the MAP test performance.

Two adaptive systems were considered to emphasize the differences between training-set-based and MAP-test performance measures. The LINEXT algorithm, as applied to the binary hypothesis test, performed the training-set map exactly. It was experimentally determined that the LINEXT system performance was well approximated by the training-set-based estimate. Alternatively, it was shown that a BPNN trained on data biases had a performance matching the MAP test estimate.

The desired system performance has implications for neural-net structure. For example, it was argued that two neurons are required in a three-layer BPNN for each implemented undulation in the training-set map. An adaptive system matching MAP test performance would not have this structural condition. However, training-set-based performance may be desirable because performance probabilities are dependent on the training set. For example, a training set proportioned toward particular hypotheses increases the system performance for conditional probabilities involving those hypotheses. In this report a Neyman-Pearson-like bound for the binary hypothesis test was shown to imply an upper bound on $\gamma_1$, the proportion of detection data in the training set; the adaptive system must be described by the training-set-based estimate for such bounds to be relevant.